NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Data to Infinity and Beyond: Examining Data Sharing and Reuse Practices in the Computer Security Community

https://doi.org/10.1109/SP61157.2025.00180

Crowder, Anna; Lu, Allison; Childs, Kevin; Stillman, Carson; Traynor, Patrick; Butler, Kevin RB (May 2025, IEEE)

Sharing high-quality research data specifically for reuse in future work helps the scientific community progress by enabling researchers to build upon existing work and explore new research questions without duplicating data collection efforts. Because current discussions about research artifacts in Computer Security focus on reproducibility and availability of source code, the reusability of data is unclear. We examine data sharing practices in Computer Security and Measurement to provide resources and recommendations for sharing reusable data. Our study covers five years (2019–2023) and seven conferences in Computer Security and Measurement, identifying 948 papers that create a dataset as one of their contributions. We analyze the 265 accessible datasets, evaluating their under-standability and level of reuse. Our findings reveal inconsistent practices in data sharing structure and documentation, causing some datasets to not be shared effectively. Additionally, reuse of datasets is low, especially in fields where the nature of the data does not lend itself to reuse. Based on our findings, we offer data-driven recommendations and resources for improving data sharing practices in our community. Furthermore, we encourage authors to be intentional about their data sharing goals and align their sharing strategies with those goals.
more » « less
Free, publicly-accessible full text available May 12, 2026
I Can Show You the World (of Censorship): Extracting Insights from Censorship Measurement Data Using Statistical Techniques

https://doi.org/10.1109/ACSAC63791.2024.00091

Crowder, Anna; Olszewski, Daniel; Traynor, Patrick; Butler, Kevin_R B (December 2024, IEEE)

In response to the growing sophistication of censor- ship methods deployed by governments worldwide, the existence of open-source censorship measurement platforms has increased. Analyzing censorship data is challenging due to the data’s large size, diversity, and variability, requiring a comprehensive under- standing of the data collection process and applying established data analysis techniques for thorough information extraction. In this work, we develop a framework that is applicable across all major censorship datasets to continually identify changes in cen- sorship data trends and reveal potentially unreported censorship. Our framework consists of control charts and the Mann-Kendall trend detection test, originating from statistical process control theory, and we implement it on Censored Planet, GFWatch, the Open Observatory of Network Interference (OONI), and Tor data from Russia, Myanmar, China, Iran, T ¨ urkiye, and Pakistan from January 2021 through March 2023. Our study confirms results from prior studies and also identifies new events that we validate through media reports. Our correlation analysis reveals minimal similarities between censorship datasets. However, because our framework is applicable across all major censorship datasets, it significantly reduces the manual effort required to employ multiple datasets, which we further demonstrate by applying it to four additional Internet outage-related datasets. Our work thus provides a tool for continuously monitoring censorship activity and acts as a basis for developing more systematic, holistic, and in-depth analysis techniques for censorship data.
more » « less
Full Text Available
"I Had Sort of a Sense that I Was Always Being Watched...Since I Was": Examining Interpersonal Discomfort From Continuous Location-Sharing Applications

https://doi.org/10.1145/3658644.3690342

Childs, Kevin; Gibson, Cassidy; Crowder, Anna; Warren, Kevin; Stillman, Carson; Redmiles, Elissa M; Jain, Eakta; Traynor, Patrick; Butler, Kevin_R B (December 2024, ACM)

Full Text Available
"Better Be Computer or I'm Dumb": A Large-Scale Evaluation of Humans as Audio Deepfake Detectors

https://doi.org/10.1145/3658644.3670325

Warren, Kevin; Tucker, Tyler; Crowder, Anna; Olszewski, Daniel; Lu, Allison; Fedele, Caroline; Pasternak, Magdalena; Layton, Seth; Butler, Kevin; Gates, Carrie; et al (December 2024, ACM)

Audio deepfakes represent a rising threat to trust in our daily communications. In response to this, the research community has developed a wide array of detection techniques aimed at preventing such attacks from deceiving users. Unfortunately, the creation of these defenses has generally overlooked the most important element of the system - the user themselves. As such, it is not clear whether current mechanisms augment, hinder, or simply contradict human classification of deepfakes. In this paper, we perform the first large-scale user study on deepfake detection. We recruit over 1,200 users and present them with samples from the three most widely-cited deepfake datasets. We then quantitatively compare performance and qualitatively conduct thematic analysis to motivate and understand the reasoning behind user decisions and differences from machine classifications. Our results show that users correctly classify human audio at significantly higher rates than machine learning models, and rely on linguistic features and intuition when performing classification. However, users are also regularly misled by pre-conceptions about the capabilities of generated audio (e.g., that accents and background sounds are indicative of humans). Finally, machine learning models suffer from significantly higher false positive rates, and experience false negatives that humans correctly classify when issues of quality or robotic characteristics are reported. By analyzing user behavior across multiple deepfake datasets, our study demonstrates the need to more tightly compare user and machine learning performance, and to target the latter towards areas where humans are less likely to successfully identify threats.
more » « less
Full Text Available

Search for: All records